Chapter 1: Data – Its Source and Compilation



In our daily lives and in various forms of media like television news or geographical books, we frequently encounter numerical information. This information represents measurements or counts from the real world and is known as data.

A single measurement is called a datum.

For example, figures like rainfall amounts (20 cm, 35 cm) or distances between cities (1385 km, 1542 km) are all considered data.

While an immense amount of data is available today, deriving meaningful insights or conclusions from it is difficult if it remains in its unprocessed, original form (raw data). Useful information must therefore be derived, deduced, or calculated logically and/or statistically from multiple data points.

When data is organized and processed to provide a meaningful answer to a question or to stimulate further inquiry, it becomes information.


Need of Data

In geography, maps are fundamental tools for understanding spatial distributions. However, data presented in tables or used for statistical analysis is equally important for explaining phenomena like population growth, distribution patterns, or the flow of goods.

Geographical phenomena often interact with each other across the Earth's surface. These interactions are influenced by many factors or variables. To understand these relationships precisely, especially their quantitative aspects, analyzing relevant data using statistical methods has become essential.

For instance, to study agricultural patterns in an area, one needs quantitative data on factors like the extent of cropped land, crop yields, total production, irrigated area, rainfall amounts, and inputs used (fertilizers, pesticides).

Similarly, analyzing the growth and characteristics of a city requires data on its total population, population density, migration figures, occupational structure, income levels, industries, and transport/communication infrastructure.

Thus, data is indispensable for conducting thorough geographical analysis.


Presentation of the Data

Simply collecting data is not enough; how it is presented and analyzed is equally crucial. Misinterpreting raw data or relying solely on simple averages without considering the distribution can lead to misleading conclusions (a statistical fallacy), potentially deviating from the actual situation.

Statistical methods are widely used today in almost all fields, including geography, for analyzing, presenting, and interpreting data to draw sound conclusions.

Quantitative analysis is increasingly preferred over purely qualitative descriptions to explain relationships between geographical variables. This shift necessitates the use of analytical tools and techniques for collecting, compiling, tabulating, organizing, and analyzing data to arrive at logical and precise findings.


Sources of Data

Data can be obtained from different origins, which are broadly categorized into two main types:

1. Primary Sources: Data collected for the very first time by the individual or organization conducting the research.

2. Secondary Sources: Data that has already been collected, processed, or published by another individual or organization, and is being used by a different researcher.

Fig. 1.1 (refer to diagram in text) illustrates the various methods used for collecting both primary and secondary data.


Sources of Primary Data

Primary data is collected directly from the source using several methods:


1. Personal Observations: This involves collecting information by directly observing phenomena in the field. Through a field survey, one can gather data on physical features (relief, drainage, soil types, vegetation), demographic characteristics (population structure, sex ratio, literacy), infrastructure (transport, communication), and settlement patterns (rural, urban). This method requires the observer to have relevant theoretical knowledge and an objective, unbiased approach.


2. Interview: The researcher obtains information directly from individuals (respondents) through verbal interaction, conversation, or dialogue. Key considerations for conducting effective interviews include preparing a clear list of questions, having a clear objective, building rapport with respondents, ensuring privacy for sensitive information, using simple and polite language, avoiding offensive questions, and asking for additional information.


3. Questionnaire/Schedule: These involve a set of written questions used to collect data. A questionnaire is filled out by the respondent themselves, often by selecting from pre-provided answers or writing brief responses. It is useful for covering large areas and can be mailed. A limitation is that it is only suitable for literate respondents. A schedule is similar but filled out by a trained enumerator who asks the questions verbally to the respondent. This method allows data collection from both literate and illiterate individuals.


4. Other Methods: Direct measurement using specialized tools can also be a source of primary data. For example, collecting data on soil or water properties using testing kits, or measuring crop health using transducers.

(Figure: a field scientist measuring crop health with an instrument)

Sources of Secondary Data

Secondary data is obtained from existing records or publications. These can be published or unpublished.


Published Sources: These include government publications (e.g., the Census of India and reports of ministries and departments), semi/quasi-government publications (e.g., reports of municipal corporations and district councils), international publications (e.g., yearbooks and reports of the United Nations and its agencies), private publications (e.g., yearbooks, surveys, and research reports), newspapers and magazines, and electronic media such as the Internet.


Unpublished Sources: These include unpublished government documents (e.g., village-level revenue records), quasi-government records (e.g., the periodical reports and development plans of municipal corporations and district councils), and private documents (e.g., records of companies, trade unions, and other organisations).


Tabulation and Classification of Data

Raw data, whether from primary or secondary sources, is initially unorganized and difficult to comprehend. To make it usable and derive meaningful inferences, it needs to be processed through tabulation and classification.

A Statistical Table is a simple and effective way to summarize and present data. It involves arranging data systematically in rows and columns. The purpose is to simplify presentation, facilitate comparisons between different data points, and allow readers to quickly find specific information.

Tables enable analysts to organize large volumes of data in a structured manner within a limited space.


Data Compilation and Presentation

Data is typically collected, organized, and presented in tables in different formats:


Absolute Data

When data is presented in its original, unprocessed numerical form (as whole numbers or integers), it is called absolute data or raw data. Examples include the total population count of a country or state, or the total production volume of a crop or industry.

Table 1.1 shows the total population (absolute figures) for India and selected states/UTs, based on the 2011 Census.

State/UT Code India/State/Union Territory Persons Males Females
INDIA 1,21,05,69,573 62,31,21,843 58,74,47,730
1 Jammu and Kashmir 1,25,41,302 66,40,662 59,00,640
2 Himachal Pradesh 68,64,602 34,81,873 33,82,729
3 Punjab 2,77,43,338 1,46,39,465 1,31,03,873
4 Chandigarh 10,55,450 5,80,663 4,74,787
5 Uttarakhand 1,00,86,292 51,37,773 49,48,519
6 Haryana 2,53,51,462 1,34,94,734 1,18,56,728
7 National Capital Territory of Delhi 1,67,87,941 89,87,326 78,00,615
8 Rajasthan 6,85,48,437 3,55,50,997 3,29,97,440
9 Uttar Pradesh 19,98,12,341 10,44,80,510 9,53,31,831
10 Bihar 10,40,99,452 5,42,78,157 4,98,21,295

Percentage/Ratio

Data can also be presented as percentages or ratios, which are calculated based on a common parameter. This format is useful for comparison and analysis of trends or proportions.

Examples include calculating literacy rates, population growth rates, or the percentage share of different sectors in agricultural or industrial production.

Table 1.2 shows the literacy rates in India over several decades, presented as percentages. The literacy rate is calculated using the formula:

$ \text{Literacy Rate} = \frac{\text{Total Number of Literates}}{\text{Total Population}} \times 100 $

Year Person (%) Male (%) Female (%)
1951 18.33 27.16 8.86
1961 28.3 40.4 15.35
1971 34.45 45.96 21.97
1981 43.57 56.38 29.76
1991 52.21 64.13 39.29
2001 64.84 75.85 54.16
2011 73.0 80.9 64.6
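
This calculation is simple to script. Below is a minimal Python sketch of the formula; the function name and the population and literate counts are purely illustrative, not census figures.

```python
def literacy_rate(literates: int, population: int) -> float:
    """Literacy rate as a percentage of total population."""
    return literates / population * 100

# Hypothetical counts, for illustration only (not actual census data).
total_population = 1_000_000
total_literates = 730_000

print(f"Literacy rate: {literacy_rate(total_literates, total_population):.2f}%")
# Output: Literacy rate: 73.00%
```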

Index Number

An index number is a statistical tool used to show changes in a variable or a group of related variables relative to a base period or location. It measures relative changes, not absolute ones.

Index numbers are widely used, particularly in economics and business (e.g., tracking price changes with the Consumer Price Index), but can also compare conditions across different places or industries.

A common method for calculation is the simple aggregate method. It is calculated as:

$ \text{Index Number} = \frac{\sum q_1}{\sum q_0} \times 100 $

Where $\sum q_1$ is the sum of the values (e.g., quantities produced) for the current period, and $\sum q_0$ is the sum of the values for the base period.

The base period value is typically set to 100, and the index number for other periods is calculated relative to this base.

Table 1.3 illustrates the production of iron ore in India and the calculation of an index number, taking 1970-71 as the base year (Index = 100).

Year Production (in million tonnes) Calculation Index Number (Base 1970-71=100)
1970-71 32.5 $\frac{32.5}{32.5} \times 100$ 100
1980-81 42.2 $\frac{42.2}{32.5} \times 100$ 130
1990-91 53.7 $\frac{53.7}{32.5} \times 100$ 165
2000-01 67.4 $\frac{67.4}{32.5} \times 100$ 207
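
Because Table 1.3 tracks a single commodity, the sums in the formula reduce to single values for each year. The following is a minimal Python sketch of the same calculation, using the production figures from the table:

```python
# Iron-ore production in million tonnes (Table 1.3); 1970-71 is the base year.
production = {"1970-71": 32.5, "1980-81": 42.2, "1990-91": 53.7, "2000-01": 67.4}
base = production["1970-71"]

for year, quantity in production.items():
    index_number = quantity / base * 100  # relative change versus the base year
    print(f"{year}: {round(index_number)}")
# 1970-71: 100, 1980-81: 130, 1990-91: 165, 2000-01: 207
```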

Processing of Data

Once collected, raw data needs to be processed to be understood and analyzed. This involves tabulating the data and classifying it into meaningful categories or groups.

For instance, if you have a list of individual scores for 60 students (like in Table 1.4 in the text), this raw data is difficult to interpret directly.

The first step in processing such ungrouped raw data is to group it into classes. This reduces the volume of data and makes it easier to identify patterns and summarize information.

Grouping of Data

Grouping data involves deciding the number of classes or groups to create and the range of values within each class (the class interval). The choice depends on the overall range of the raw data (the difference between the highest and lowest values).

For example, if scores range from 02 to 96, you could decide to create 10 classes with an interval of 10 units each (e.g., 0-10, 10-20, 20-30, ..., 90-100).
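
As a rough sketch of this arithmetic (assuming, as above, scores spanning 2 to 96 and a target of 10 classes):

```python
import math

lowest, highest = 2, 96   # range of the raw scores
num_classes = 10

# Class interval: the range divided by the number of classes, rounded up.
interval = math.ceil((highest - lowest) / num_classes)
print(interval)  # 10, giving classes 0-10, 10-20, ..., 90-100
```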

Process of Classification

After determining the classes and intervals, the raw data is classified by assigning each individual observation to the appropriate class. A common method for this is the Four and Cross Method (or tally marks).

For each data point, a tally mark is placed in the corresponding class. Tally marks are grouped in fives (four vertical lines crossed by a diagonal line) for easy counting. For example, if a score is 47, a tally mark is added to the 40-50 class.
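
A small, purely illustrative Python helper that renders a count in this four-and-cross style (using a backslash as the diagonal stroke):

```python
def tally(count: int) -> str:
    """Render a count as tally marks grouped in fives (four strokes and a cross)."""
    groups, remainder = divmod(count, 5)
    marks = ["||||\\"] * groups
    if remainder:
        marks.append("|" * remainder)
    return " ".join(marks)

print(tally(7))   # ||||\ ||
print(tally(10))  # ||||\ ||||\
```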

Frequency Distribution

Once the data is classified into groups using tally marks, the total count of tally marks in each group gives the number of individuals (or observations) falling into that class. This count is called the frequency for that class.

A table showing the classes and their corresponding frequencies is called a frequency distribution. It illustrates how the values of a variable are distributed across different ranges.

Frequencies are presented as either Simple Frequencies or Cumulative Frequencies.


Simple Frequencies: Represented by 'f', this is the count of observations strictly within each specific class or group (as obtained from the tally marks). The sum of all simple frequencies ($\sum f$) equals the total number of observations (N).

Cumulative Frequencies: Represented by 'Cf', these are obtained by successively adding the simple frequencies. The cumulative frequency for a class is the sum of its simple frequency and the cumulative frequency of the preceding class. The cumulative frequency for the last class should equal the total number of observations (N).

Cumulative frequencies help in quickly determining the number of observations below or above a certain value.
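
The sketch below ties these steps together in Python: it bins a set of scores into 10-unit classes, counts the simple frequencies (the tally step), and accumulates them. The scores themselves are hypothetical, since Table 1.4's raw data is not reproduced here.

```python
# Hypothetical scores (Table 1.4's raw data is not reproduced here).
scores = [2, 9, 18, 21, 25, 30, 34, 41, 47, 50, 55, 58, 59, 63, 67, 72, 76, 83, 88, 96]

# Exclusive-method classes: 0-10, 10-20, ..., 90-100.
classes = [(lower, lower + 10) for lower in range(0, 100, 10)]

# Simple frequency f: observations with lower <= score < upper.
f = [sum(lower <= s < upper for s in scores) for lower, upper in classes]

# Cumulative frequency Cf: running total of the simple frequencies.
cf, running_total = [], 0
for frequency in f:
    running_total += frequency
    cf.append(running_total)

for (lower, upper), fi, cfi in zip(classes, f, cf):
    print(f"{lower:2d}-{upper:<3d}  f = {fi}  Cf = {cfi}")

assert cf[-1] == len(scores)  # the last Cf must equal N
```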


When forming classes, especially for quantitative data, two common methods are used:

Exclusive Method

In this method, the upper limit of a class is the same as the lower limit of the next class (e.g., 0-10, 10-20, 20-30). An observation equal to the upper limit is *excluded* from that class and counted in the *next* class, where the same value is the lower limit. For example, a value of 30 is included in the 30-40 class, not the 20-30 class. This ensures that each observation falls into only one class.

Table 1.6 shows frequency distribution using the exclusive method.

Group f Cf
00-10 4 4
10-20 5 9
20-30 5 14
30-40 7 21
40-50 6 27
50-60 10 37
60-70 8 45
70-80 6 51
80-90 5 56
90-100 4 60
Total $\sum f = N = 60$

Inclusive Method

In this method, the upper limit of a class is *included* within that same class (e.g., 0-9, 10-19, 20-29). The upper limit of one class is usually one less than the lower limit of the next class. This method ensures that each observation falls into only one class and that both the lower and upper boundaries define the values included.

Table 1.7 shows frequency distribution using the inclusive method.

Group f Cf
0 – 9 4 4
10 – 19 5 9
20 – 29 5 14
30 – 39 7 21
40 – 49 6 27
50 – 59 10 37
60 – 69 8 45
70 – 79 6 51
80 – 89 5 56
90 – 99 4 60
Total $\sum f = N = 60$
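
The practical difference between the two methods is where a value that sits exactly on a class limit is counted. A small sketch, assuming the 10-unit classes used above:

```python
value = 30

# Exclusive method: classes are [lower, upper), so 30 belongs to 30-40, not 20-30.
exclusive = next((lo, lo + 10) for lo in range(0, 100, 10) if lo <= value < lo + 10)

# Inclusive method: classes are [lower, upper], e.g. 20-29, 30-39; 30 belongs to 30-39.
inclusive = next((lo, lo + 9) for lo in range(0, 100, 10) if lo <= value <= lo + 9)

print(exclusive, inclusive)  # (30, 40) (30, 39)
```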

Frequency Polygon

A frequency polygon is a line graph that visually represents a frequency distribution. It is created by plotting points at the midpoints of each class interval on the x-axis and their corresponding frequencies on the y-axis, and then connecting these points with straight lines. It is useful for visualizing the shape of a distribution and comparing multiple distributions.

(Figure: a frequency polygon plotted from a frequency distribution)
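
A minimal matplotlib sketch of such a plot, using the class midpoints and frequencies from Table 1.6:

```python
import matplotlib.pyplot as plt

# Class midpoints and frequencies from Table 1.6 (exclusive method).
midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [4, 5, 5, 7, 6, 10, 8, 6, 5, 4]

# Plot frequency against each class midpoint and join the points with lines.
plt.plot(midpoints, frequencies, marker="o")
plt.xlabel("Marks (class midpoints)")
plt.ylabel("Frequency")
plt.title("Frequency Polygon")
plt.show()
```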

Ogive

An Ogive (pronounced 'ojive') is a graphical representation of a cumulative frequency distribution. It shows the cumulative frequency plotted against the upper or lower boundaries of the class intervals.

Ogives are constructed using either the 'less than' or the 'more than' method. In the 'less than' method, cumulative frequencies are plotted against the upper limit of each class; in the 'more than' method, they are plotted against the lower limit of each class.

Both the 'less than' and 'more than' ogives can be plotted on the same graph (Fig. 1.8). The intersection point of the two ogives represents the median of the distribution.

Table 1.10 combines the data for both less than and more than methods for plotting a comparative ogive.

Marks obtained Less than More than
0 - 10 4 60
10 - 20 9 56
20 - 30 14 51
30 - 40 21 46
40 - 50 27 39
50 - 60 37 33
60 - 70 45 23
70 - 80 51 15
80 - 90 56 9
90 - 100 60 4
(Figure: combined 'less than' and 'more than' ogives)
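
A companion matplotlib sketch plotting both ogives from Table 1.10; the 'less than' values are plotted at the upper class limits and the 'more than' values at the lower limits, so their intersection falls at the median:

```python
import matplotlib.pyplot as plt

# Cumulative frequencies from Table 1.10.
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
less_than = [4, 9, 14, 21, 27, 37, 45, 51, 56, 60]
more_than = [60, 56, 51, 46, 39, 33, 23, 15, 9, 4]

plt.plot(upper_limits, less_than, marker="o", label="'Less than' ogive")
plt.plot(lower_limits, more_than, marker="s", label="'More than' ogive")
plt.xlabel("Marks")
plt.ylabel("Cumulative frequency")
plt.title("Combined ogives (the intersection marks the median)")
plt.legend()
plt.show()
```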

Exercises

This section contains questions and activities designed to reinforce understanding of the concepts of data, its sources, compilation, processing, and presentation discussed in the chapter.